Note: This analysis presents results of the 2021 Stack Overflow Developer Survey, focusing on the responses of 8193 US-based respondents. Because the analysis concentrates on questions around remuneration, outliers above the upper 1.5 × IQR boundary (more than 1.5 interquartile ranges above the third quartile) are removed. In general, only complete survey responses are considered.
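The outlier rule mentioned in the note can be sketched as follows. This is a minimal Python illustration, not the code used for the report; the function name `remove_upper_outliers` and the salary values are hypothetical, and it assumes that only the upper boundary (Q3 + 1.5 × IQR) is applied.

```python
import statistics

def remove_upper_outliers(values, k=1.5):
    """Drop values above Q3 + k * IQR (upper boundary only, an assumption)."""
    # "inclusive" interpolates linearly between the observed data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper_boundary = q3 + k * (q3 - q1)
    return [v for v in values if v <= upper_boundary]

# Hypothetical yearly compensation values, not taken from the survey.
salaries = [60_000, 85_000, 95_000, 110_000, 120_000, 135_000, 150_000, 900_000]
cleaned = remove_upper_outliers(salaries)
print(cleaned)
```

In this toy data set only the 900,000 entry lies above the boundary, so it is the only value dropped.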

Survey Participants

In which state or territory of the USA do you live?

The largest group of US-based respondents lives in California (988), followed by Texas (575), New York (488) and Washington (487).

What is your age?

Three out of four survey participants are between 25 and 44 years old.

Which of the following describe you, if any?

With regard to gender, the surveyed population shows a clear imbalance: female participants account for only 6.35% of respondents.

Which of the following options best describes you today? Here, by “developer” we mean “someone who writes code.”

The vast majority of participants are professional developers.

Which of the following describes your current job?

The largest share of respondents consider themselves Full-Stack Developers, although the specific job roles are manifold.

Which of the following best describes your current employment status?

With regard to employment status, about 95% state that they work for an employer, while freelancers and independent workers are a minority.

Which of the following best describes the highest level of formal education that you’ve completed?

80% of the respondents hold at least a bachelor’s degree.

How did you learn to code?

School, books, and online resources are named as the most frequent ways of learning to code.

At what age did you write your first line of code or program?

More than half of all respondents report having written their first line of code during adolescence.

What do you do when you get stuck on a problem?

Most of the participants find the answers to their problems through Google and Stack Overflow. Just as frequently, they suggest taking a break and coming back to the problem with a fresh mind.

How frequently would you say you visit Stack Overflow?

Almost half of the participants visit Stack Overflow on a daily basis.

Analysis of Annual Compensation Figures

Randomized Sampling (Central Limit Theorem)

To gain a deeper understanding of the distribution of the salaries and to illustrate the Central Limit Theorem, random samples of sizes 20, 30, 40 and 50 are drawn. The Central Limit Theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed (Source: Wayne W. LaMorte - Boston University School of Public Health). For each sample size, the sample mean is computed 10,000 times. As illustrated below, the standard deviation of the sample means shrinks as the sample size increases, while their mean stays close to 128,000.

## [1] "Sample Size: 20, Mean: 128031.970000, Standard Deviation, 11329.660000"
## [2] "Sample Size: 30, Mean: 128032.480000, Standard Deviation, 9341.340000" 
## [3] "Sample Size: 40, Mean: 128171.880000, Standard Deviation, 8135.990000" 
## [4] "Sample Size: 50, Mean: 128107.030000, Standard Deviation, 7227.590000"
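The shrinking spread matches the usual standard-error relationship σ/√n: multiplying each reported standard deviation by √n gives roughly 51,000 for all four sample sizes. The resampling procedure can be sketched as below. This is a Python illustration under stated assumptions, not the code behind the output above: the population is a hypothetical right-skewed (log-normal) stand-in for the salary data, the helper `sample_mean_distribution` is an invented name, and only 2,000 repetitions are used to keep the run short.

```python
import random
import statistics

def sample_mean_distribution(population, sample_size, n_repeats=10_000, seed=0):
    """Return (mean, standard deviation) of repeated sample means."""
    rng = random.Random(seed)
    means = [
        # rng.choices draws with replacement, as in the CLT statement.
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_repeats)
    ]
    return statistics.fmean(means), statistics.stdev(means)

# Hypothetical skewed "salary" population; NOT the survey data.
pop_rng = random.Random(42)
population = [pop_rng.lognormvariate(11.7, 0.4) for _ in range(8_000)]

results = {
    n: sample_mean_distribution(population, n, n_repeats=2_000)
    for n in (20, 30, 40, 50)
}
for n, (mean, sd) in results.items():
    print(f"Sample size: {n}, mean: {mean:.2f}, standard deviation: {sd:.2f}")
```

The means of the sample means should stay near the population mean for every sample size, while their standard deviation falls roughly in proportion to 1/√n.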

Alternative Sampling Techniques for Annual Compensation Data

Sampling is useful when we want to identify patterns that can be observed within a subset of the whole data. We sample our data based on the attribute ‘US_state’ and use ‘CONVERTERCOMPYEARLY’ as the value whose distribution is examined. Four sampling techniques (SRS without replacement, systematic sampling, inclusion probabilities, and stratified sampling) are compared with the population dataset as a whole.

Systematic sampling and stratified sampling generally have the same minimum value as the population dataset, while SRS without replacement has a slightly higher minimum and inclusion probabilities a much higher one. All four samples have higher Q1, mean, Q3, and maximum values than the population dataset, with inclusion probabilities showing the highest values of the four. Of the four techniques, systematic sampling is the most similar to the population dataset and would therefore be the most suitable sampling technique here.

## [1] 811.1
## Stratum 1 
## 
## Population total and number of selected units: 968 276.5568 
## Stratum 2 
## 
## Population total and number of selected units: 352 100.5661 
## Stratum 3 
## 
## Population total and number of selected units: 475 135.7071 
## Stratum 4 
## 
## Population total and number of selected units: 571 163.1342 
## Stratum 5 
## 
## Population total and number of selected units: 473 135.1357 
## Number of strata  5 
## Total number of selected units 811.1
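Three of the techniques compared above can be sketched in a few lines. The following Python example is an illustration only and does not reproduce the report's results: the rows are synthetic, the function names are hypothetical, and the stratified variant assumes proportional allocation, which is what the ratio of selected units to stratum size in the output above suggests. The stratum sizes (968, 352, 475, 571, 473) are taken from that output; the salary values are invented.

```python
import random
from collections import defaultdict

def srs_without_replacement(rows, n, rng):
    """Simple random sample: every row is selected at most once."""
    return rng.sample(rows, n)

def systematic_sample(rows, n, rng):
    """Pick a random start, then take one row per step of length N / n."""
    step = len(rows) / n
    start = rng.uniform(0, step)
    last = len(rows) - 1
    return [rows[min(int(start + i * step), last)] for i in range(n)]

def stratified_sample(rows, key, n, rng):
    """Proportional allocation: each stratum contributes about n * N_h / N rows."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(rows))
        sample.extend(rng.sample(members, min(n_h, len(members))))
    return sample

# Synthetic population: (row id, stratum label, hypothetical salary).
rng = random.Random(1)
sizes = {"Stratum 1": 968, "Stratum 2": 352, "Stratum 3": 475,
         "Stratum 4": 571, "Stratum 5": 473}
rows = []
for label, size in sizes.items():
    for _ in range(size):
        rows.append((len(rows), label, rng.lognormvariate(11.7, 0.4)))

n = 811
srs = srs_without_replacement(rows, n, rng)
systematic = systematic_sample(rows, n, rng)
stratified = stratified_sample(rows, key=lambda r: r[1], n=n, rng=rng)
print(len(srs), len(systematic), len(stratified))
```

Because each stratum's share is rounded separately, the stratified sample can differ from the requested size by a unit or two. The systematic sample depends on the row order, so sorting or shuffling the rows beforehand changes what it represents.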

Relationship between Total Compensation and Years of Professional Experience

Years of professional experience and salary show a moderate positive correlation, with a Pearson coefficient of about 0.32. It also appears that the upper end of the pay range can already be reached at an early career stage.

Pearson Correlation:

## [1] 0.3179661
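For reference, the Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. A coefficient of about 0.32 corresponds to r² ≈ 0.10, meaning that roughly 10% of the variance in compensation is linearly associated with the other variable. The sketch below shows the computation in plain Python; the function name `pearson` and the data points are hypothetical and unrelated to the survey.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-deviations (covariance up to a constant factor).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root sums of squared deviations (standard deviations up to the same factor).
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

# Hypothetical values: years of experience and yearly compensation.
years = [1, 3, 5, 8, 12, 20, 25]
pay = [90_000, 150_000, 110_000, 95_000, 160_000, 120_000, 140_000]
r = pearson(years, pay)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```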

Relationship between Total Compensation and Highest Education Level

Although salaries vary widely within every education level, the boxplot suggests that survey participants holding a Master’s or Doctoral degree earn higher salaries on average (about 135k), with only a slight difference between these two groups.

Total Compensation by Size of the Organization

The size of the employer seems to play an important role: larger organizations tend to pay higher salaries than their smaller competitors.

Comparative Study - Analysis of Gender Differences

Salary Gap between Male and Female Survey Participants

The density graphs below suggest that the female survey participants earn less than their male colleagues. This is consistent with the findings of the U.S. Bureau of Labor Statistics (source: bls.gov/cps/earnings.htm).

Comparison of Highest Education Level

When comparing men and women with regard to their educational level, the bar chart below suggests that women more often hold a Master’s or Doctoral degree.

Comparison of Professional Coding Experience

The density distribution below suggests that the male survey participants have, on average, more years of professional experience, whereas women are more strongly represented among the younger age groups.

Google Search Term Popularity for top Tech-Stacks since 2012

Source: The data has been retrieved from the Google Trends API using the gtrendsR package.

Programming Languages

Databases

Tools

Platforms

Machine Learning